ABSTRACT 


To investigate the phylogenetic utility of entire, nuclear-encoded small-subunit (18S) ribosomal DNA sequences, 
we compared the rate of evolution and phylogenetic resolution of entire 18S sequences with those for the chloroplast 
gene rbcL using a suite of 59 angiosperms and 3 gymnosperms (Gnetum, Ephedra, and Zamia) as outgroups. For 
rbcL, 482 (33.6%) of the 1431 base positions were phylogenetically informative, whereas for 18S rDNA 341 (18.4%) 
of the 1853 positions were informative. Pairwise comparisons within the angiosperms show that rbcL is generally 
about three times more variable than 18S rDNA. However, because the 18S region is approximately 400 base pairs 
longer than rbcL, the number of phylogenetically informative sites per molecule is only about 1.4 times 
greater for rbcL than for 18S rDNA. Not only are sites more variable in rbcL than in 18S rDNA, but this 
variability is more evenly distributed over the length of rbcL. In contrast, 18S rDNA shows highly variable regions 
interspersed with regions of extreme conservation. Minimum-length Fitch trees were constructed for each matrix, 
and the results were compared to a tree derived from a previous global analysis of rbcL sequences based on 499 
seed plants. Parsimony analyses showed that several clades are strongly supported by both data sets, such as Gnetales, 
monocots, paleoherbs, Santalales, and various clades within Rosidae s.l. and Asteridae s.l. Some clades (e.g., Santalales) 
have higher base substitution rates for 18S rDNA, permitting the assessment of inter- and intrafamilial relationships. 
This comparative study indicates that 18S rDNA sequences contain sufficient information to conduct phylogenetic 
studies at higher taxonomic levels (family and above) within angiosperms. rDNA sequences are best applied to such 
deep divergences, but the amount of variation differs significantly among taxonomic groups. 
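
The arithmetic behind the "about 1.4 times" figure can be checked directly from the counts quoted above. The following is a minimal sketch, not part of the original analysis; the variable and function names are illustrative assumptions, and only the four counts come from the text.

```python
# Counts quoted in the abstract: (informative sites, aligned positions).
counts = {
    "rbcL": (482, 1431),
    "18S rDNA": (341, 1853),
}


def informative_fraction(informative: int, total: int) -> float:
    """Fraction of aligned positions that are phylogenetically informative."""
    return informative / total


# Ratio of informative sites per molecule (rbcL relative to 18S rDNA).
per_molecule_ratio = counts["rbcL"][0] / counts["18S rDNA"][0]

# Length difference between the two matrices, in aligned positions.
length_difference = counts["18S rDNA"][1] - counts["rbcL"][1]

if __name__ == "__main__":
    for name, (inf, total) in counts.items():
        frac = informative_fraction(inf, total)
        print(f"{name}: {frac:.3f} of positions informative")
    print(f"informative sites, rbcL / 18S rDNA: {per_molecule_ratio:.2f}")
    print(f"18S matrix longer by {length_difference} positions")
```

The per-molecule ratio is a ratio of raw counts, so it is smaller than the roughly threefold per-site difference seen in pairwise comparisons, because the longer 18S matrix offers more positions.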


The major morphologically based angiosperm 
classifications proposed during the past 15 years 
show marked similarities, yet also differ in funda- 
mental ways. Recently, systematists have explored 
the utility of both DNA and RNA sequencing to 
resolve higher-level relationships within the angio- 
sperms, as well as seed plants in general (e.g., 
Hamby & Zimmer, 1992; Chase et al., 1993). 
The largest molecular phylogenetic study con- 
ducted to date (Chase et al., 1993) employed se- 
quence data for the chloroplast gene rbcL and was 
based on sequences for 499 species of angiosperms 
and other seed plants. The gene rbcL is typically 
1428 base pairs in length, and the advantages of 
using this gene in phylogenetic reconstruction have 
been thoroughly reviewed (e.g., Ritland & Clegg, 
1987; Palmer et al., 1988; Chase et al., 1993). 
These advantages include easy amplification via 
the polymerase chain reaction, essentially no in- 
sertion-deletion events, appropriate length and base 
substitution rate for inferring phylogeny at higher 


levels, and the availability of a set of sequencing 
primers (provided free of charge by G. Zurawski). 
Although some variation in the rate of rbcL se- 
quence evolution occurs from lineage to lineage 
(Bousquet et al., 1992; Gaut et al., 1992), unequal 
rates of evolution do not appear to be sufficient to 
obscure major phylogenetic relationships (Chase et 
al., 1993). Because of these numerous advantages, 
rbcL sequences now exist for over 1500 taxa (M. 
Chase, pers. comm.), making rbcL the most fre- 
quently sequenced protein-coding gene. During the 
past several years, the phylogenetic analysis of 
rbcL sequences has provided unprecedented in- 
sights into higher-level relationships in angiosperms 
and gymnosperms (e.g., Chase et al., 1993; Conti 
et al., 1993; Duvall et al., 1993; Michaels et al., 
1993; Morgan & Soltis, 1993; Qiu et al., 1993). 

Because most evolutionary studies are devoid of 
positive controls to prove or disprove particular 
events, the strongest support that can be obtained 
in phylogenetic reconstruction is congruence resulting 
from analysis of multiple independent data 
sets (Miyamoto & Cracraft, 1991). Hypotheses of 
relationships are either strongly or weakly sup- 
ported based upon statistical tests involving the 
data themselves (e.g., consistency index, bootstrap 
values, decay values). The more numerous and 
more varied the data sets that corroborate a given 
relationship, the greater the support for that re- 
lationship. Although rbcL and, more recently, oth- 
er chloroplast genes such as matK (see Johnson 
& Soltis, 1995, this issue) and ndhF (see Olmstead 
& Reeves, 1995, this issue) have been shown to 
have great utility in phylogeny estimation, many 
workers have emphasized the need for comparison 
of chloroplast DNA-based phylogenetic trees with 
those from other sources, especially those based 
on sequences from nuclear-encoded genes (e.g., 
Palmer, 1985; Rieseberg & Soltis, 1991; Doyle, 
1992; Friedlander et al., 1992; Chase et al., 1993). 
At lower taxonomic levels (genus and below) com- 
parative sequencing of the nuclear internal tran- 
scribed spacer (ITS) region has shown tremendous 
potential for inferring phylogenies (see Baldwin et 
al., 1995, this issue) and has stimulated the com- 
parison of phylogenies based on chloroplast and 
nuclear DNA. At higher taxonomic levels, the phy- 
logenetic trees presented for angiosperms based on 
rbcL sequences (Chase et al., 1993) have similarly 
stimulated interest in conducting a comparable 
phylogenetic analysis based on nuclear gene se- 
quences. In this paper we explore the utility of 
entire nuclear 18S rDNA sequences for inferring 
phylogeny at higher levels within the angiosperms. 

In plants, ribosomes exist in the chloroplasts, 
mitochondria, and cytoplasm and are composed of 
a small and a large subunit, each of which contains 
rRNA and associated proteins. Although sedimentation 
coefficients vary slightly, plant nuclear small-subunit 
rRNA will be referred to here as the 18S 
rRNA. The 18S, 5.8S, and 26S nuclear rRNA 
genes occur as a unit (cistron) separated by spacer 
regions. These cistrons are repeated hundreds to 
thousands of times in tandem arrays within the 
genome (Appels & Honeycutt, 1986). Ribosomal 
RNA cistrons are usually located in the nucleolar 
organizing region of the nucleus and may reside 
on several different chromosomes in plants 
(Thompson & Flavell, 1988). Sequence similarity 
between the individual cistrons within a single or- 
ganism is extremely high, possibly due to unequal 
crossing over during meiosis, gene conversion, slip- 
page, transposition, and RNA-mediated changes 
(Arnheim et al., 1980; Dover, 1982; Arnheim, 
1983; Dover, 1987). The homogeneity of ribo- 
somal RNA cistrons has been referred to as con- 


certed evolution (Brown et al., 1972; Arnheim et 
al., 1980; Zimmer et al., 1980). Ribosomal loci 
represent an extreme type of concerted evolution 
(with essentially complete homogenization), making 
them advantageous for reconstruction of deep phy- 
logenetic events (Sanderson & Doyle, 1993). Re- 
cent summaries of ribosomal RNA structure, func- 
tion, gene organization, and evolution have been 
presented (Jorgensen & Cluster, 1988; Hillis & 
Dixon, 1991; Hamby & Zimmer, 1992). 

Numerous low-molecular-weight (5S and 5.8S) 
rRNA sequences now exist (see compilation by 
Specht et al., 1991), and attempts have been made 
to use these sequences in addressing the origin and 
evolution of green plants (Hori et al., 1985; Hori 
& Osawa, 1987). However, because these mole- 
cules are less than 200 bp in length, they provide 
a very limited number of phylogenetically infor- 
mative sites; hence, large numbers of equally par- 
simonious solutions often result when conducting 
studies using many taxa (Bremer et al., 1987). 
Specifically addressing 5S rRNA sequences, Mish- 
ler et al. (1988) summarized concerns for the use 
of rRNA sequences for phylogenetic reconstruction 
that apply to the 18S and 26S rRNAs as well, such as 
cosubstitution in stem regions of helices, transition/transversion 
bias, alignment problems, different 
evolutionary rates, and homoplasy. 

Both large- and small-subunit ribosomal RNA 
sequences have been used to examine the very 
deepest branches among organisms, such as the 
domains Eukarya, Bacteria, and Archaea (Wolters 
& Erdmann, 1986; Olsen, 1987; Woese, 1987). 
Ribosomal RNA sequence data have also been used 
to elucidate phylogenetic relationships in animals 
(e.g., Sogin et al., 1986; Field et al., 1988; Wada 
& Satoh, 1994), protozoa (Schlegel et al., 1991), 
algae (Bhattacharya & Druehl, 1988; Buchheim 
et al., 1990; Huss & Sogin, 1990; Kantz et al., 
1990; Hendriks et al., 1991; Chapman & Buch- 
heim, 1991), and fungi (Forster et al., 1990; Swann 
& Taylor, 1993). Prior to 1990, most rRNA se- 
quences were being determined from cloned ma- 
terial or by using Sanger dideoxynucleotide reac- 
tions and reverse transcriptase with rRNA templates 
(Lane et al., 1985). During the past several years, 
most workers have moved to direct sequencing of 
ribosomal DNA (rDNA) amplified via the poly- 
merase chain reaction (PCR; Mullis & Faloona, 
1987). The major reasons for the shift to DNA 
sequencing are: (1) rRNA is labile to RNases, mak- 
ing it methodologically difficult to extract, purify, 
and store; (2) rRNA secondary structure causes 
polymerase stalling, visualized as "hard stops" on 
sequencing gels, resulting in ambiguous sequence; 
(3) when RNA is extracted from a tissue sample 
only RNA genes can be sequenced, whereas, the- 
oretically, any gene (nuclear, plastid, mitochon- 
drial) is available from total genomic DNA extracts; 
(4) DNA is easier to extract and is more stable 
than RNA; and (5) with DNA, both strands are 
available for sequencing, allowing more complete 
coverage of the molecule (as well as an opportunity 
to double-check each base position) by using prim- 
ers designed for both the coding and noncoding 
strands. 

The first 18S ribosomal RNA sequences of an- 
giosperms were of rice (Takaiwa et al., 1984), 
maize (Messing et al., 1984), and soybean (Eckenrode 
et al., 1985). Later, the complete 18S 
rDNA sequence of the cycad Zamia pumila was 
published; with the above three higher plants and 
several outgroups, a phylogenetic tree was pro- 
duced (Nairn & Ferl, 1988). Seven additional (par- 
tial) 18S rRNA sequences of members of the Po- 
aceae were later determined, and a phylogenetic 
analysis of this family was conducted (Hamby & 
Zimmer, 1988). 

Phylogenetic relationships in the parasitic plant 
order Santalales were examined by Nickrent & 
Franchina (1990). This study was the first since 
the work by Nairn & Ferl (1988) to use essentially 
complete 18S rRNA sequences in a phylogenetic 
analysis. Sequences from 13 angiosperm species 
representing 10 families were analyzed, and one 
most parsimonious tree was obtained that supported 
the monophyly of the order Santalales, confirmed 
the basal position of Olacaceae within the order, 
and showed Viscaceae to be derived from Santa- 
laceae. This study indicated that sufficient infor- 
mation exists in complete 18S rRNA sequences to 
allow phylogenetic comparisons to be made at the 
family level and above. 

Despite the phylogenetic promise of these initial 
analyses, relatively few 18S rRNA sequences were 
determined in the years that followed, perhaps due 
in part to the tremendous interest in rbcL sequencing 
for inferring phylogeny at this same level. As 
a result, the phylogenetic potential of 18S sequence 
data remained unexplored. A few entire sequences 
were published, including Alnus glutinosa (Savard 
& Lalonde, 1991), Arabidopsis thaliana (Unfried 
et al., 1989), Lycopersicon esculentum (Kiss 
et al., 1989), and Sinapis alba (Rathgeber & 
Capesius, 1990). A sequence for Fragaria x an- 
anassa (Simovic et al., 1992) exists, but contains 
a large number of base changes atypical of other 
Rosaceae (and was therefore not included in the 
present study). Several studies, however, explored 
the phylogenetic potential of partial 18S sequences. 


For example, Troitsky et al. (1991) used five dif- 
ferent rRNA molecules (including nuclear 18S 
rRNA) to examine the early evolution of seed plants. 
For 18S rRNA, 21 sequences representing 256 
bp (from position 499 to 755 on Glycine) were 
used. Six dicots and eight monocots were included, 
as were Ephedra and Gnetum, two cycads, two 
gymnosperms (Podocarpus and Taxus), and Ly- 
copodium. Two conclusions of this study were that 
the divergence of angiosperms from gymnosperms 
occurred before the early Carboniferous, i.e., at 
least 360 million years before present, and that 
the Gnetopsida are not monophyletic. Given the 
small number of base pairs used and that no sta- 
tistical support for the clades was provided, these 
results must be viewed with caution. More recently, 
Chaw et al. (1993) used 18S rDNA sequence data 
to demonstrate support for the placement of Taxus 
in Coniferales; however, only four sequences (for 
Taxus, Pinus, Podocarpus, and Ginkgo) and an 
outgroup (Zamia) were used in the analysis. 

By far the largest analyses of 18S sequences 
have been undertaken by Zimmer and her collab- 
orators, who conducted phylogenetic studies using 
direct sequencing of rRNA from approximately 60 
vascular plant species (Zimmer et al., 1989; Ham- 
by & Zimmer, 1992). Their efforts toward pro- 
ducing a molecular phylogeny of the angiosperms 
were based on the sequencing of a portion of the 
small- (18S) and large- (26S) subunit rRNA mol- 
ecules. Hamby & Zimmer (1992) used a total of 
1701 base positions per taxon (1097 base positions 
from the 18S region and 604 positions from the 
26S region) in a phylogenetic analysis of seed plants 
that included 29 dicot and 17 monocot genera. 
Two shortest trees were found with a large number 
of equally parsimonious solutions one or several 
steps longer. The shortest trees had a number of 
features in accord with various existing classifi- 
cations, such as the presence of a monophyletic 
Gnetales clade as sister to the angiosperms, and 
the basal position within the angiosperms of several 
Magnoliid groups, such as Nymphaeaceae and Pi- 
peraceae. Sampling within nonmagnoliid groups was 
sparse, however, which could explain the unusual 
relationships suggested among more derived an- 
giosperms (e.g., the presence of a clade composed 
of Ranunculus, Duchesnea, Spinacia, and Stel- 
laria). Because many of the interior and basal dicot 
nodes were poorly supported in the rRNA tree, 
systematists remained unsure of the utility of ri- 
bosomal RNA sequences for resolving questions of 
angiosperm phylogeny. More recently, relation- 
ships among the tribes of Onagraceae were ex- 
amined by Bult & Zimmer (1993) using partial 


Volume 82, Number 2 
1995 


sequences of 18S and 26S rRNA. Although rela- 
tively few phylogenetically informative sites were 
found, several relationships were in accord with 
those revealed by rbcL analysis (Conti et al., 1993). 

Using the same primers as the Zimmer group, 
Martin & Dowd (1991) obtained partial 18S and 
26S rRNA sequences for 12 angiosperm species 
from 7 families. Their purpose was to compare the 
relative merits of phylogeny estimation using ribosomal 
sequences with those of estimation using rbcL. 
The authors concluded "both phylogenetic trees 
gave good grouping within families but in neither 
case was there resolution of the branching order 
of major taxa...." The authors further stated 
"that neither macromolecule alone was likely to 
yield a solution to the problem of angiosperm phy- 
logeny and therefore that studies of both, at least, 
will be required." However, aside from two familial 
placeholders, only two species (maize and rice) were 
shared by the two data sets; furthermore, taxon 
density was clearly a limitation in making any state- 
ments regarding the branching order of major taxa. 
Additional studies using partial 18S rRNA sequences 
include analyses of six angiosperm families (Boulter 
& Gilroy, 1992), Papilionaceae (Martin & Dowd, 
1993), and Byblidaceae-Roridulaceae (Conran & 
Dowd, 1993). The latter study examined the phy- 
logenetic placement of two carnivorous genera, 
Roridula and Byblis, that have been variously 
classified using morphological and chemical characters. 
The analysis of partial 18S rRNA sequences 
from these genera and 26 other angiosperms sup- 
ported the position of Roridula in the lower Ros- 
idae and the placement of Byblis in the Asteridae, 
results in agreement with rbcL sequence analysis 
(Albert et al., 1992). 

The major goal of this study was to explore in 
more detail the potential of entire 18S rDNA sequences 
for inferring phylogeny at higher levels in 
the angiosperms. We wanted to understand better 
the rate of evolution and distribution of base sub- 
stitutions of 18S rDNA compared to the widely 
used chloroplast gene rbcL. To accomplish these 
objectives, comparable 18S and rbcL sequence 
data sets were constructed for a similar suite of 
59 angiosperms and 3 gymnosperms. Minimum- 
length Fitch trees were constructed, and relation- 
ships as well as evolutionary rates were compared 
for the entire 18S gene and rbcL. These phylo- 
genetic analyses were not meant to be exhaustive 
studies of angiosperm relationships; we recognize 
that taxonomic density is a limitation in this study. 
Rather, our goal was to assess the relative merits 
and attributes of each molecule through direct comparison. 
This study differs from previous comparative 
analyses of 18S and rbcL sequences (e.g., 
Martin & Dowd, 1991) in that essentially complete 
sequences of both genes were used, taxon sampling 
was more extensive, and, in the majority of cases, 
the same taxon was sequenced for both genes. 


MATERIALS AND METHODS 
TAXON SAMPLING AND SEQUENCE ACQUISITION 


Given the large, taxonomically diverse array of 
rbcL sequences that is already available, taxon 
inclusion for this study was determined mainly by 
the availability of 18S rRNA or rDNA sequences. 
We therefore determined the 18S rDNA sequences 
of additional plant taxa to provide greater coverage 
of the major clades identified in the global rbcL 
analysis of Chase et al. (1993). The rbcL sequences 
were chosen to correspond at the specific or, sec- 
ondarily, the generic level to an available sequence 
of 18S rRNA. Some rbcL sequences included in 
Chase et al. (1993), but not deposited in GenBank, 
were kindly provided by Mark Chase (Brassica, 
Pachysandra, Pisum, and Impatiens). An rbcL 
sequence of Hydrocotyle that was not included in 
the Chase et al. analysis was also kindly provided 
by G. Plunkett. The original rbcL sequence for 
Zea contained errors; hence the newly determined 
sequence (Gaut et al., 1992) was used herein. In 
addition to the three published rbcL sequences of 
Santalales (Morgan & Soltis, 1993), 12 other rbcL 
sequences were determined to increase sampling 
within this one order, thereby allowing phylogeny 
comparison of more closely related species using 
both molecules. A data set of 62 taxa was ultimately 
identified for which sequences of both molecules 
were available for use in this study (Tables 1 and 
2). Of the 62 sequence pairs (18S rDNA and rbcL) 
used herein, 37 were from the same species; an 
additional 15 sequences were from different species 
within the same genus. Different genera were used 
for ten families to allow a broader sampling within 
the angiosperms (rbcL/18S rDNA): Betula/Alnus, 
Pachysandra/Buxus, Convolvulus/Cuscuta, 
Polemonium/Gilia, Pisum/Glycine, Lambertia/Knightia, 
Reinwardtia/Linum, Byrsonima/Malpighia, 
Pyrola/Monotropa, and Mahonia/Podophyllum. 
Although members of three of the 
above generic pairs are classified in separate families 
(i.e., Cuscutaceae/Convolvulaceae, Monotropaceae/Pyrolaceae, 
Podophyllaceae/Berberidaceae), 
they were deemed to be related closely enough 
(based upon traditional classifications) to be used 
as placeholders. Following Cronquist (1981), 47 
angiosperm families are represented in this study. 

Seven of the 62 18S rDNA sequences used in 
this study were previously published by workers 
other than the authors (Table 1). With the excep- 
tion of six sequences that were obtained via direct 
sequencing of rRNA using reverse transcriptase 
(Nickrent & Franchina, 1990), all of the remaining 
rDNA sequences were derived from PCR products. 


AMPLIFICATION AND SEQUENCING 


The genomic DNAs used for amplification and 
sequencing of 18S rDNA and rbcL were extracted 
using a modification of the hot CTAB method (Doyle 
& Doyle, 1987; Nickrent, 1994). Plant samples 
were derived from either fresh, silica gel-dried, or 
herbarium material. The PCR protocols employed, 
as well as the oligonucleotide primers used for the 
amplification and subsequent sequencing of rDNA, 
are provided in Nickrent (1994) and Nickrent & 
Starr (1994). The general PCR strategy and am- 
plification primers used for rbcL are provided in 
Morgan & Soltis (1993). For both rbcL and 18S 
rDNA, the first author employed direct sequencing 
of double-stranded PCR products. These products 
were prepared for sequencing by gel purification, 
whereby the PCR bands were bound to DEAE membranes, 
eluted, and precipitated in ethanol. DNA 
so prepared was denatured at 100°C and snap-cooled 
for primer annealing. In contrast, the second author 
used each of the two PCR primers individually 
to generate single-stranded DNA from the double- 
stranded PCR products. Single-stranded 18S and 
rbcL DNAs were then purified by precipitation with 
20% PEG/2.5 M NaCl, as described by Morgan 
& Soltis (1993). In all instances, the chain-ter- 
mination method of sequencing was employed 
(Sanger et al., 1977) using [35S]dATP and the 
Sequenase® version 2.0 kit. DNA fragments were 
separated in 6% acrylamide gels; gels were sub- 
sequently fixed and used to expose film using stan- 
dard techniques. Compressions and other struc- 
ture-related artifacts were resolved either through 
the use of alternative nucleotides (deaza-dGTP, 
dITP) or by sequencing the same region on the 
complementary strand. 


SEQUENCE FEATURES AND MULTIPLE ALIGNMENTS 


All alignments were initially conducted on a Sun 
SPARCstation running the Genetic Data Environment 
(GDE, version 2.2; Smith, 1992). These 
alignments were downloaded to a Macintosh computer 
and directly imported into MacClade (version 
3.01; Maddison & Maddison, 1992). For each 
molecule, the chart features of MacClade were used 
to examine patterns of variability and conservation, 
transition/transversion bias, and (for rbcL) the 
number of changes per codon position. This pro- 
gram was also used to determine the number of 
phylogenetically informative sites for each align- 
ment. MacClade files were saved in Nexus format 
and then imported into PAUP (version 3.1; Swof- 
ford, 1993) for parsimony analyses. Files contain- 
ing the complete alignments of both molecules are 
available from both authors by sending a formatted 
3.5-inch diskette. 
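
The two summary statistics mentioned in this paragraph can be illustrated on a toy alignment. The sketch below is not MacClade's implementation. It assumes the usual parsimony definition of an informative site (at least two character states, each present in at least two taxa) and treats "?", "-", and "N" as missing data. All names and the toy sequences are hypothetical.

```python
from collections import Counter

MISSING = set("?-N")  # symbols treated as missing data in this sketch
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_informative(column: str) -> bool:
    """True if at least two states each occur in at least two taxa."""
    tally = Counter(b for b in column.upper() if b not in MISSING)
    return sum(1 for n in tally.values() if n >= 2) >= 2


def count_informative_sites(alignment: list[str]) -> int:
    """Count informative columns in equal-length aligned sequences."""
    columns = ("".join(col) for col in zip(*alignment))
    return sum(is_informative(col) for col in columns)


def classify_substitution(a: str, b: str) -> str:
    """Label a difference between two unambiguous bases."""
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    if a == b:
        return "identical"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Hypothetical four-taxon alignment, five positions long.
toy_alignment = [
    "ACGTA",
    "ACGCA",
    "ATGCA",
    "ATG-A",
]

if __name__ == "__main__":
    print(count_informative_sites(toy_alignment))
    print(classify_substitution("A", "G"))
    print(classify_substitution("A", "C"))
```

In the toy matrix only the second column qualifies: the fourth column varies, but one of its two states is carried by a single taxon, so it cannot favor one tree over another under parsimony.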

The alignment of rbcL sequences is straightfor- 
ward and can be accomplished easily by sight be- 
cause very few length mutations exist. In contrast, 
rDNA sequence alignment was performed as an 
iterative process that simultaneously dealt with 
phylogenetic relationships, compensatory muta- 
tions, and higher-order structure. The higher-order 
rRNA structure of Glycine max (Fig. 1), like the 
recently proposed structure for Rafflesia keithii 
(Nickrent & Starr, 1994), is similar to the one 
given for maize by Gutell et al. (1985) but includes 
the structural changes proposed for yeast (Gutell, 
1993) and eukaryotes (Neefs et al., 1993). This 
structure differs somewhat from that proposed by 
Senecoff & Meagher (1992), which was based 
largely on a mammalian model. The soybean struc- 
ture given here was used as a reference for ex- 
amining structural variation in the other plant spe- 
cies examined and as a guide during the alignment 
process. 

Until recently, the secondary structure of rRNA 
made direct sequencing of this molecule very dif- 
ficult and aroused some criticism over the utility 
of rRNA for phylogenetic reconstruction in plants. 
With the advent of PCR, the amplification and 
sequencing of rDNA is no more difficult than for 
other genes (Nickrent & Starr, 1994). Further- 
more, secondary structure provides much-needed 
corroboratory information regarding base pairing 
and compensatory base changes that is essential in 
producing alignments that reflect proper base ho- 
mology. 

FIGURE 1. Proposed secondary structural model for the small-subunit (18S) ribosomal RNA of Glycine max. The 
primary sequence of soybean was determined by Eckenrode et al. (1985). This structural model follows the general 
models proposed by Gutell et al. (1985) for Zea, Gutell (1993) for Saccharomyces, and Neefs et al. (1993) for 
eukaryotes in general. Helix numbering corresponds to Neefs et al. (1993). The structure for helix 6 (V1 region) 
follows Gutell (1993), with the alternative interpretation according to Neefs et al. (1993). The structure for the V4 
region follows Nickrent & Sargent (1991). For an alternative model of the V4 involving a pseudoknot between helices 
E23-8 and E23-9, see Neefs et al. (1993). The V6 region is absent in eukaryotes. Tertiary interactions are indicated 
by thick lines. [Secondary-structure diagram not reproduced.] 

Nickrent & Soltis 
18S rDNA and rbcL Phylogenies 
Compared 

The 18S rDNA sequences obtained were approximately 
1770 base pairs in length. Published 
complete 18S rDNA sequences of higher plants 
vary in length from 1800 to 1813 base pairs (mean 
of 1807 base pairs); thus, in this study, 97.9% of 
the total length of the molecule was obtained for 
most taxa. Owing to alignment spacers ("-"), the 
total length of the matrix was 1853. Insertion/deletion 
events (indels) were treated as missing 
data. Certain regions of 18S rDNA are variable in 
primary sequence and length, such as the termini 
of helices E10-1, E23-1, and 49 (Fig. 1). These 
regions confound unambiguous alignment; hence 
positions 227-239 and 676-685 on the alignment 
(equivalent to sites 224-232 and 664-673 on the 
Glycine molecule) were eliminated from analysis 
as suggested by Swofford & Olsen (1990). Those 
base pairs corresponding to the 25e forward 18S 
rDNA PCR primer (positions 1-20) were removed 
from the analysis. Similarly, those base pairs corresponding 
to the 1769 reverse PCR primer (sites 
1810-1853 on the alignment, 1764-1807 on 
Glycine) were eliminated. With the exception of 
the excluded base pairs, alignment of the 18S rDNA 
sequences was straightforward because most length 
mutations involve single base insertions or deletions. 
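
The exclusions described above amount to a character mask over the 1853-position matrix. The following sketch shows one way such a mask could be expressed. The range boundaries are taken from the text, whereas the function names and the list-based representation are illustrative assumptions rather than the procedure used in PAUP or MacClade.

```python
MATRIX_LENGTH = 1853  # total aligned 18S rDNA positions reported in the text

# 1-based, inclusive alignment ranges excluded from analysis.
EXCLUDED_RANGES = [
    (1, 20),       # forward PCR primer
    (227, 239),    # ambiguously aligned region
    (676, 685),    # ambiguously aligned region
    (1810, 1853),  # reverse PCR primer
]


def included_positions(length, excluded):
    """Return the 1-based positions that remain after exclusion."""
    dropped = set()
    for start, end in excluded:
        dropped.update(range(start, end + 1))
    return [pos for pos in range(1, length + 1) if pos not in dropped]


def apply_mask(sequence, kept):
    """Keep only the characters of an aligned sequence at retained positions."""
    return "".join(sequence[pos - 1] for pos in kept)


kept = included_positions(MATRIX_LENGTH, EXCLUDED_RANGES)

if __name__ == "__main__":
    n_excluded = MATRIX_LENGTH - len(kept)
    print(f"{n_excluded} positions excluded, {len(kept)} retained")
```

Under these ranges 87 alignment positions are set aside (20 + 13 + 10 + 44), leaving 1766 positions available to the parsimony analysis.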

The total length of the rbcL data matrix was 
1431. However, the first 30 base pairs were not 
used, because, after amplification, this portion of 
the gene is identical to the Z1 forward amplification 
primer. Sequences of Nymphaea, Houttuynia, and 
Ranunculus were incomplete (see Table 2); "?" 
was used to indicate missing sites. 


PHYLOGENETIC ANALYSES 


Minimum-length Fitch parsimony trees were 
constructed using PAUP version 3.1.1 (Swofford, 
1993) with MULPARS and TBR branch swapping. 
Given the number of taxa (62), only heuristic search 
strategies could be employed. Both data sets were 
analyzed giving all changes equal weight. Trials 
using the character-state transformation weighting 


model of Albert et al. (1993) for the rbcL matrix 
and a transformation matrix encompassing a 10:1 
bias favoring transitions over transversions for the 
18S rDNA data gave results similar to those of trials with 
equal weighting. To determine whether multiple 
"islands" of equally parsimonious 
trees exist (Maddison, 1991), 100 replicate 
searches with random taxon addition were con- 
ducted. To obtain estimates of reliability for mono- 
phyletic groups, bootstrap (Felsenstein, 1985) 
analyses (100 replicates) were conducted. For both 
data sets, the bootstrap analysis was performed 
using simple taxon addition, TBR branch swapping, 
ACCTRAN character-state optimization, and un- 
weighted characters. 
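
For readers unfamiliar with the optimality criterion, the sketch below scores fixed trees under equally weighted Fitch parsimony and draws one bootstrap pseudoreplicate by resampling sites with replacement. It is a toy illustration only. It does not reproduce PAUP's heuristic search, branch swapping, or handling of missing data, and the tree encoding (nested 2-tuples of taxon names), the function names, and the four-taxon data are all hypothetical.

```python
import random


def fitch_site(tree, states):
    """Return (state set, minimum changes) for one site on a binary tree.

    `tree` is a leaf name (str) or a 2-tuple of subtrees; `states` maps
    each leaf name to its base at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum Fitch changes over all sites; `alignment` maps taxon -> sequence."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


def bootstrap_replicate(alignment, rng):
    """Resample sites with replacement to build one pseudoreplicate matrix."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in picks)
        for taxon, seq in alignment.items()
    }


# Hypothetical data: four taxa, four aligned positions.
toy = {"A": "ACGT", "B": "ACGC", "C": "ATGC", "D": "ATGT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

if __name__ == "__main__":
    print(tree_length(tree_1, toy), tree_length(tree_2, toy))
    replicate = bootstrap_replicate(toy, random.Random(0))
    print(tree_length(tree_1, replicate))
```

On the toy data the first tree requires three changes and the second four, so the first would be preferred. A bootstrap analysis repeats the tree search on many resampled matrices and reports how often each group recurs.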

The phylogenetic trees derived from the global 
rbcL analyses of Chase et al. (1993) were used to 
construct a "reference tree" for the subset of taxa 
used in this study (Fig. 2). This reference tree was 
also constructed to assess the effect of taxon sam- 
pling and density on the stability and composition 
of various clades as determined by the present 
analyses. The Search II strategy employed by Chase 
et al. (1993) resulted in 3900 shortest trees, one 
of which was chosen at random and depicted in 
their figure 2B. Search II was preferred by the 
authors (over their Search I) because it included 
a greater diversity of taxa, was able to save more 
trees of shortest length, and did not use relative 
weighting of character-state transformations. In 
figure 2B of Chase et al. (1993), the angiosperms 
were divided into 19 major groups, the composition 
of which varied from single genera to groups of 
many families. Owing to lack of an 18S rDNA 
sequence, the reference tree constructed here does 
not include four of the major rbcL clades: Cera- 
tophyllum, Gunnera, Laurales, and Asterid V. The 
Asterid V clade was not in the Search II topology 
of Chase et al. (1993), but Asterid V included 
Santalales in the tree resulting from Search I. It 
is well-known that sampling affects tree topologies 
(Felsenstein, 1985); the reference tree of Figure 





FIGURE 2. The "reference tree" constructed from the topology found in tree 2B in the global rbcL analysis that 
included 499 taxa (Chase et al., 1993). This tree represents a null hypothesis that assumes rbcL is insensitive to 
taxon sampling, i.e., all topologies using fewer taxa are fully concordant with the global topology. Instances where 
two generic names are given represent cases where different generic representatives of a family were used (rbcL 
taxon first followed by 18S rDNA taxon in parentheses). Those taxa in italics were not included in the study by Chase 
et al. (1993); their placement on the tree is derived from the analysis reported here (e.g., Santalales) or based upon 
traditional familial classifications (e.g., Hydrocotyle, Apiaceae). The names of the major angiosperm clades (right 
side) correspond to Chase et al. (1993). Taxa marked with an asterisk (*, Santalales and Paeonia) were located on 
the Asterid V clade in Search I of Chase et al. (1993). 
2 should therefore be interpreted as a null hypothesis 
that assumes rbcL is insensitive to taxon 
inclusion and that the topology of a restricted anal- 
ysis is congruent with that of a global analysis. 


RESULTS 
GENERAL FEATURES OF rbcL AND 18S rDNA 


The length of rbcL is highly conserved in higher 
plants with few insertion/deletion events reported 
(Chase et al., 1993). Positions 1426-1428 form 
the most common stop codon, although longer 
reading frames up to 1458 bp have been reported 
in Asteraceae (Kim et al., 1992). Among the taxa 
analyzed herein, a single insertion of three bases 
occurs in Zea beginning at position 1404, whereas 
all other full length rbcL sequences used herein 
are of length 1428. For rbcL, 482 (33.6%) of the 
1431 base positions are potentially phylogeneti- 
cally informative. The length of complete 18S rDNA 
also varies: 1800 bp (Lycopersicon), 1804 bp 
(Brassica), 1807 bp (Glycine), 1809 bp (Zea), 
1812 bp (Oryza), and 1813 bp (Zamia). Of the 
1853 positions for the 18S rDNA alignment, 341 
(18.4%) are potentially phylogenetically informa- 
tive. 
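
The proportions quoted above, and the per-molecule comparison that follows from them, can be checked with a few lines of Python. The counts are taken directly from the text; the variable names are illustrative only.

```python
# Counts of potentially informative sites and alignment lengths from the text.
rbcl_informative, rbcl_sites = 482, 1431
r18s_informative, r18s_sites = 341, 1853

# Fraction of positions that are potentially informative in each matrix.
rbcl_fraction = rbcl_informative / rbcl_sites
r18s_fraction = r18s_informative / r18s_sites

# Per-site comparison (how much denser informative sites are in rbcL).
per_site_ratio = rbcl_fraction / r18s_fraction

# Per-molecule comparison (total informative sites, ignoring length).
per_molecule_ratio = rbcl_informative / r18s_informative
```

The fractions evaluate to roughly 0.337 and 0.184, so informative sites are nearly twice as dense in rbcL, whereas the per-molecule ratio is only about 1.4 because the 18S rDNA alignment is longer.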

When one compares sequences of two distantly 
related taxa, for example, Zamia and Pisum (Fa- 
baceae) for rbcL and Zamia and Glycine (Faba- 
ceae) for 18S rDNA, the rbcL sequence compar- 
ison yields 191 mutational differences (13.3% of 
the 1431 sites), whereas comparison of 18S se- 
quences yields 138 differences (7.6% of the ca. 
1810 sites). Similar comparisons using angiosperms 
show that rbcL is generally about three times more 
variable than 18S rDNA. For example, comparison 
of Pisum and Spinacia rbcL sequences demon- 
strates that 139 of the 1428 sites (9.7%) are 
different. In contrast, a similar comparison of Gly- 
cine (Fabaceae) and Spinacia 18S rDNA sequenc- 
es indicates that only 62 of the ca. 1808 (3.4%) 
sites are different. Not only is the rate of evolution 
of rbcL considerably higher than that of the 18S 
gene (about 3 times faster), but even when the 
greater length of the 18S gene is taken into con- 
sideration, rbcL still exhibits approximately 1.4 
times as many variable sites as 18S rDNA. The 
distribution of variable sites for the two molecules 
is also quite different. When the number of steps 
from one of the equally most parsimonious rbcL 
and 18S rDNA cladograms (e.g., Figs. 6 and 8, 
respectively) is plotted against site, the different 
variability patterns are graphically illustrated (Figs. 
3 and 4). For rbcL, sites in general are more 
variable, and, although certain regions clearly are 
more variable than others, this variability appears 
more evenly distributed over the entire length of 
the molecule (Fig. 3) than for 18S (Fig. 4). That 
is, 18S rDNA shows highly variable regions inter- 
spersed with regions of extreme conservation (Fig. 
4). Significantly, the variable domains indicated on 
the secondary structure of Glycine (V1-V9, Fig. 
1) can be readily identified in Figure 4. The sec- 
ondary structural study conducted by Senecoff & 
Meagher (1992) used dimethyl sulfate to modify 
(and thereby identify) adenine and cytosine resi- 
dues of single-stranded portions of the soybean 18S 
rRNA molecule. Their data largely confirm the 
higher-order structure shown in Figure 1, especial- 
ly for variable regions 1 and 4. 
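
The pairwise comparisons reported earlier in this section amount to an uncorrected proportion of differing sites. The following Python sketch computes that proportion for two aligned sequences and reproduces the angiosperm comparison from the quoted counts. Skipping positions scored as "?" or "-" is an assumption of this example, as the source does not state how such positions were treated; the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, missing="?-"):
    """Proportion of differing sites among positions scored in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in missing or b in missing:
            continue  # assumed handling: ignore missing data and gaps
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Difference counts quoted in the text (differences / sites compared).
rbcl_pisum_spinacia = 139 / 1428
r18s_glycine_spinacia = 62 / 1808
rate_ratio = rbcl_pisum_spinacia / r18s_glycine_spinacia

# Toy aligned sequences: 7 comparable sites, 2 of which differ.
toy = p_distance("ACGT?ACG", "ACGA-ACC")
```

With the quoted counts the ratio is a little under 3, consistent with the statement that rbcL is about three times more variable than 18S rDNA among angiosperms.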

Pairwise 18S rDNA sequence comparisons with- 
in the flowering plants examined here indicated 
that most angiosperms differ from the above noted 
Glycine sequence at only 1-5% of the sites. Higher 
than average rates (numbers) of nucleotide substi- 
tution in 18S rDNA can be seen, however, in 
certain parasitic plants such as members of Vis- 
caceae and Cuscutaceae (included in the present 
study) and Balanophoraceae, Hydnoraceae, and 
Rafflesiaceae (Nickrent & Starr, 1994). Repre- 
sentatives of the latter three families were not in- 
cluded herein because they apparently lack an rbcL 
gene (Nickrent & dePamphilis, unpublished data). 
The causes of such elevated rates of 18S sequence 
evolution are currently under investigation by the 
first author. 

For both data sets, transitions outnumber trans- 
versions by approximately a factor of two. For the 
rbcL data set, there were 1520 unambiguous tran- 
sitions and 862 unambiguous transversions; for 
18S rDNA, there were 1099 and 574 unambig- 
uous transitions and transversions, respectively. The 
specific types of mutational events for the two 
molecules are also very similar, differing mainly in 
the frequency of the A to G transition (14.6% of 
the total changes for rbcL, 5.3% of the total for 
18S rDNA). Steps calculated over the rbcL tree 
by codon position demonstrate that most changes 
occur, as expected, in the third position followed 
by first and second position. 
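
The transition:transversion ratios implied by these counts can be verified directly. The Python sketch below also shows how a single substitution is classified; the `classify` helper is illustrative and is not part of the original analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(base_from, base_to):
    """Classify a substitution between two distinct DNA bases."""
    bases = PURINES | PYRIMIDINES
    if base_from not in bases or base_to not in bases:
        raise ValueError("bases must be A, C, G, or T")
    if base_from == base_to:
        raise ValueError("identical bases are not a substitution")
    both_purines = base_from in PURINES and base_to in PURINES
    both_pyrimidines = base_from in PYRIMIDINES and base_to in PYRIMIDINES
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    return "transition" if both_purines or both_pyrimidines else "transversion"


# Unambiguous changes reported in the text (transitions / transversions).
rbcl_ratio = 1520 / 862
r18s_ratio = 1099 / 574
```

The ratios are approximately 1.76 for rbcL and 1.91 for 18S rDNA, which agrees with the stated factor of about two.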


PHYLOGENETIC ANALYSES 


rbcL. The heuristic search of the rbcL data matrix 
yielded 12 most parsimonious trees, all in one 
island, of length 3090 with a consistency index 
excluding uninformative substitutions (C.I.—) of 
0.284 and retention index (R.I.) of 0.467. The strict 
consensus tree is illustrated in Figure 5. The main 
differences among the 12 most parsimonious trees 
FIGURE 3. Variability histogram for rbcL. The number of steps (y axis) was determined from rbcL tree number 
3 (of 12 equally parsimonious trees) as shown in Figure 6 with the interval widths (x axis) set to four base pairs. 
Variability is distributed relatively evenly over the 1431 sites of the molecule. 


were the relationships among members of Santal- 
ales and in the relative position of Lambertia to 
nonpaleoherb dicots. The strict consensus tree 
shares a number of features with the rbcL refer- 
ence tree (Fig. 2). Using Zamia as the outgroup, 
the two representatives of Gnetales (Ephedra and 
Gnetum) form a monophyletic group strongly sup- 
ported by 66 synapomorphies that is sister to the 
angiosperms (Fig. 6). The angiosperms form a 
monophyletic group united by 39 base substitutions 
(bootstrap value of 93%). Within the angiosperms, 
the monocots examined (Zea, Oryza, and Spar- 
ganium), with the exception of Acorus, are the 
sister group to all other angiosperms. This rela- 
tionship differs from the reference tree where the 
monocots, including Acorus, are monophyletic and 
are sister to the Magnoliales/Paleoherbs II group. 
In the present analysis, the Paleoherbs I group 
(Houttuynia, Peperomia, Asarum, Saruma, and 
Aristolochia) is disrupted by the inclusion of Acorus 
and Drimys (Magnoliales). The Ranunculids 
(Akebia, Mahonia, and Ranunculus) have the 
same composition and general topology as seen in 
the reference tree. This latter clade is part of a 
trichotomy in the strict consensus tree of Figure 
5 that also comprises Lambertia (the single rep- 
resentative of the Hamamelid I group) and the 
remaining dicots. The present analysis of rbcL 
sequences does not include the closest relatives of 
Lambertia (i.e., Sabia, Nelumbo, Platanus) as 
determined in the Chase et al. analysis. Pachy- 
sandra, representing Hamamelid II, occupies a 
similar position on the strict consensus and refer- 
ence trees. Several of the remaining clades have 
taxon compositions identical to those of the ref- 
erence tree, although the topologies within these 
clades are not necessarily identical to those of the 
reference tree. These clades of identical compo- 
sition include Asterid III (Pyrola, Polemonium, 
and Impatiens), Rosid II (Brassica, Tropaeolum, 
FIGURE 4. Variability histogram for 18S rDNA. The number of steps (y axis) was determined from 18S rDNA tree 
number 5 (of 26 equally parsimonious trees) as shown in Figure 8 with the interval widths set to four base pairs. 
Regions of variability, generally concentrated in the variable domains (Fig. 1), are interspersed with extremely 
conserved sites over the 1853 total sites for the molecule. 


and Gossypium), Santalales, Asterid II (Pittos- 
porum, Hedera, and Hydrocotyle), and Asterid I 
(Lycopersicon and Convolvulus). The members of 
Asterid IV (Hydrangea, Cornus, and Nyssa) do 
not form a monophyletic group herein, but do 
appear near each other at the base of the Asterid 
I and II groups. The sole representative of the 
Caryophyllids, Spinacia, was positioned within the 
Rosid I clade as opposed to sister to Santalales on 
the reference tree. The position of Caryophyllids 
differed also in the two searches conducted by 
Chase et al. (1993). 

FIGURE 5. The strict consensus tree of 12 equally most parsimonious cladograms derived from a heuristic search 
of the 62-taxon rbcL matrix; tree length = 3104, C.I. minus uninformative sites = 0.283, R.I. = 0.463. Groups 
whose compositions are identical to those of the reference tree (Fig. 2) are indicated by solid braces. Groups that 
appear para- and polyphyletic (relative to the reference tree) are indicated by dashed braces. The positions of underlined 
taxa (Acorus and Francoa) differ significantly from those expected from the reference tree (monocots and Rosid I, 
respectively). 

In our 62-taxa rbcL analysis, the Rosid I and 
Rosid III clades are nearly identical in composition 
to those of Chase et al. (1993). In our strict con- 
sensus tree, all taxa of the Rosid I clade, with the 
exception of Francoa and the addition of Spinacia, 
form a monophyletic group (compare Figs. 2 and 
5). However, the omission of Francoa from this 
clade again likely reflects taxon density. The closest 
relatives of Francoa based on the Chase et al. 
(1993) analysis (Greyia, Viviana, Wendtia) were 
not included herein. With the exception of Paeon- 
ia, Rosid III also appears as a monophyletic group 
in our analysis of rbcL sequences. One should 
regard this difference with great caution given that 
the position of Paeonia shifts dramatically between 
the two searches of Chase et al. (1993). The Santalales 
form a monophyletic group in the present 
analysis, and their position here as sister to the 
Asterid clades is similar to the results of Search I 
of Chase et al., where the Santalales (represented 
in Chase et al. only by Phoradendron, Schoepfia, 
and Osyris), along with Gunnera, appear as the 
sister to all other asterids. In contrast, in Search 
II of Chase et al., the Santalales appear as sister 
to a clade containing the Caryophyllids. In the 
detailed analysis of rbcL sequences of asterids con- 
ducted by Olmstead et al. (1993), Santalales are 
not a component of Asteridae s.l. and they are 
likely members of a broadly defined rosid clade. 
Bootstrap values and branch lengths of the 62- 
taxa rbcL tree (Fig. 6) suggest the presence of 
several strongly supported major clades within the 
angiosperms. The monocots, Paleoherbs, and Mag- 
noliales (represented by Drimys) appear as the 
sister to the remainder of the angiosperms, which 
are supported as a monophyletic group by a high 
bootstrap value (92%). This large clade comprises 
most of the taxa included in the corresponding 
clade of Figure 2 and represents the “eudicots,” 
which have triaperturate or triaperturate-derived 
pollen (Donoghue & Doyle, 1989; Chase et al., 
1993; Qiu et al., 1993). Within this large eudicot 
clade, the Ranunculids, Lambertia, and Pachy- 
sandra appear as the sister to another large, strongly 
supported clade (bootstrap value of 91%). There 
are, however, few strongly supported subclades 
within this large clade. Subclades that received 
moderate to strong support (bootstraps of 70-80%) 
include the Asterid I, II, III, and IV, Rosid II, 
Rosid III minus Paeonia, and Santalales. Within 
the Santalales, the monophyly of the mistletoe family, 
Viscaceae, is supported by a bootstrap value 
of 86%. 


18S rDNA. Cladistic analysis of the 18S rDNA 
matrix yielded 26 equally parsimonious trees of 
length 2021, all in one island. Each of these trees 
had a C.I.— of 0.301 and an R.I. of 0.440. The 
strict consensus tree (Fig. 7) reveals that the Gne- 
tales again appear as the sister to the angiosperms 
and that the angiosperms form a well-supported 
monophyletic group (bootstrap value of 100%, Fig. 
8). Most clades in the 18S consensus tree are 
derived from a large polytomy, whereas the rbcL 
consensus tree (Fig. 5) displays considerable resolution. 
In the 18S analysis, the monocots Zea, 
Oryza, and Sparganium (minus Acorus) form a 
monophyletic group (bootstrap value of 91%) as 
does each of the following: Ranunculids, Rosid III, 
Santalales, Asterid I, and Asterid II. 

A bootstrap analysis of the 18S rDNA data set 
(Fig. 8) indicates that Acorus and then Nymphaea 
are the sisters to a large clade containing the re- 
maining angiosperms; however, these relationships 
received only weak support (bootstrap values less 
than 50%). A group of paleoherbs (Asarum, Sar- 
uma, Aristolochia, Houttuynia, and Peperomia) 
then appears as the sister to all remaining taxa. 
The large remaining clade corresponds, with one 
exception, to the “eudicots,” as defined by Chase 
et al. (1993). The main discrepancy in the com- 
position of eudicots between the rbcL tree and the 
18S rDNA tree pertains to the relationships of 
Drimys: in contrast to the rbcL tree, the 18S tree 
places Drimys in the eudicot clade as sister to 
Glycine. The 18S eudicot clade is defined by only 
six base substitutions (Fig. 8) and is not present in 
the strict consensus tree (Fig. 7), whereas in the 
rbcL analysis, 22 base substitutions support this 
clade, and the bootstrap value is high (92%) (Fig. 
6). 

Despite the poorer resolution of the 18S than 
the rbcL tree, several subclades appear in both 
analyses. For example, the three genera of Aris- 
tolochiaceae (Asarum, Saruma, Aristolochia) form 
a monophyletic group, but the association with 
Houttuynia and Peperomia, the other subclade of 
the Paleoherb I group of Chase et al. (1993), is 
only weakly supported (bootstrap value less than 
50%). The genera of Ranunculids (Akebia, Ma- 
honia/Podophyllum, and Ranunculus) also form 
a monophyletic group in both trees, although the 
position of this clade is different. 

FIGURE 6. One of the 12 equally most parsimonious phylograms (cladograms that show branch lengths) derived 
from the heuristic search of the 62-taxon rbcL matrix; tree length 3090, C.I. minus uninformative sites = 0.284, 
R.I. = 0.467. Numbers above the branches indicate branch lengths (i.e., number of steps or nucleotide substitutions). 
Numbers below the branches indicate the percentage of trees (from 100 bootstrap replications) that support that 
node. Branches without bootstrap percentages were found in less than 50% of the trees. 

FIGURE 7. The strict consensus tree of 26 equally most parsimonious cladograms derived from a heuristic search 
of the 62-taxon 18S rDNA matrix; tree length = 2118, C.I. minus uninformative sites = 0.285, R.I. = 0.396. 
Groups whose compositions are identical to those of the reference tree (Fig. 2) are indicated by solid braces. Groups 
that appear para- and polyphyletic (relative to the reference tree) are indicated by dashed braces. The positions of 
underlined taxa (Glycine and Impatiens) differ significantly from those expected from the reference tree (Rosid I 
and Asterid III, respectively). 

FIGURE 8. One of the 26 equally most parsimonious phylograms derived from the heuristic search of the 62- 
taxon 18S rDNA matrix; tree length 2021, C.I. minus uninformative sites = 0.301, R.I. = 0.440. 

Relationships within and among the several rosid 
clades show similarities in the 18S rDNA and rbcL 
trees, as well as several marked differences. 18S 
rDNA sequence data corroborate the results of 
rbcL sequence analysis in suggesting close rela- 
tionships between some members of the Rosid I 
clade (Fig. 8, Alnus (representing Betulaceae), Mo- 
rus, Prunus, Francoa, and Malpighia (repre- 
senting Malpighiaceae)). In both the 18S and rbcL 
sequence analysis, a close relationship is apparent 
between Lepuropetalon and Euonymus, and also 
Morus and Betulaceae. However, the position of 
Linaceae (represented by Reinwardtia in the rbcL 
analysis and by Linum in the 18S analysis) differs 
markedly between the 18S and rbcL trees. Rein- 
wardtia is part of the Rosid I clade in the rbcL 
tree, whereas Linum is only distantly related to 
other Rosid I taxa in the 18S trees. Similarly, the 
placement of Fabaceae differs in the two analyses 
with Pisum appearing with other Rosid I taxa in 
the rbcL tree, but with Glycine appearing as the 
sister of Drimys in the 18S tree. 

Both 18S and rbcL sequence data suggest a 
close relationship between Brassica and Tropaeo- 
lum (Rosid II; see Rodman et al., 1993). However, 
the placement of Gossypium differs markedly in 
the two trees. In the rbcL tree (Fig. 5), Gossypium 
is the sister of Brassica and Tropaeolum; all three 
genera are part of the Rosid II clade of Chase et 
al. (1993). In contrast, Gossypium is the sister of 
Impatiens in the 18S tree (Fig. 8) and is well 
removed phylogenetically from other Rosid II taxa. 

Phylogenetic analyses of 18S and rbcL sequenc- 
es also agree in suggesting a close relationship 
among Chrysosplenium, Heuchera, and Ribes, 
members of the Rosid III clade of Chase et al. 
(1993). The 18S analysis also places Paeonia in 
this clade, as does one of the two searches of Chase 
et al. (1993). Both analyses also concur in rec- 
ognizing a well-supported monophyletic Santalales, 
although the position of this large clade differs 
between the two analyses. In the rbcL tree, San- 
talales appear as the sister to members of Asteridae 
sensu lato (Olmstead et al., 1993), whereas in the 
18S analysis (Fig. 8) Santalales form the sister 
group of the Rosid III clade. 

Several of the relationships among Asterid taxa 
seen in the rbcL analysis are also found in the 
shortest 18S trees. For example, Lycopersicon and 
Convolvulaceae (represented by Convolvulus and 
Cuscuta in the rbcL and 18S analyses, respec- 
tively) are sister taxa in both analyses. These taxa 
represent the Asterid I group of Chase et al. (1993). 
Similarly, the Asterid II group of Pittosporum, 
Hedera, and Hydrocotyle form a monophyletic 
group in both analyses. The Asterid IV subclade 
(Chase et al., 1993) that includes Nyssa, Cornus, 
and Hydrangea (along with Gilia, Polemoniaceae) 
also forms a subclade in the 18S analysis. In the 
rbcL consensus tree, these three genera do not 
form a monophyletic clade but are closely allied 
basal members of an Asterid assemblage. 

In contrast to these similarities between the 18S 
and rbcL trees, the placement of those taxa representing 
the Asterid III clade (Chase et al., 1993) 
differs between the most parsimonious 18S and 
rbcL trees. In the 18S analysis (Fig. 8), Polemon- 
iaceae (represented by Gilia) and Monotropa ap- 
pear in clades with other Asterids. As previously 
mentioned, Impatiens emerges as sister to Gossypium. 
In contrast, Polemonium, Pyrola (a genus 
closely allied with Monotropa, Kron & Chase, 
1993), and Impatiens form a subclade allied with 
the Rosid I clade in the shortest rbcL trees. This 
difference, however, may well reflect taxon density 
given that Polemonium and Pyrola are part of the 
Asteridae sensu lato when larger numbers of rbcL 
sequences are analyzed (Chase et al., 1993; Olm- 
stead et al., 1993). 

Several other placements and relationships differ 
dramatically between the 18S and rbcL trees. These 
include the phylogenetic positions of Buxus, Gos- 
sypium, Spinacia, and the sister-group relation- 
ship of Drimys and Glycine suggested by the 18S 
analysis. The same close relationship between Dri- 
mys and Glycine was also seen in the ribosomal 
RNA phylogenetic analysis of Hamby & Zimmer 
(1992). 


Relationships within Santalales. As noted 
above, we analyzed Santalales in more detail to 
compare the resolution of 18S and rbcL sequence 
data at lower taxonomic levels. The rbcL sequence 
data reveal the presence of three clades within the 
order: (1) Gaiadendron, Misodendron, Schoepfia, 
Opilia; (2) Antidaphne, Eubrachion, Osyris, San- 
talum; and (3) Arceuthobium, Dendrophthora, 
Phoradendron, Ginalloa, Korthalsella, Notho- 
thixos, and Viscum. The first group, minus Opilia, 
is strongly supported (bootstrap value of 88%) as 
are the second (82%) and third groups (86%). 
Analysis of 18S sequences reveals a very similar 
pattern of relationship. The Viscaceae form a 
monophyletic group (88% bootstrap value). Con- 
sidering group 2, Antidaphne, Eubrachion, and 
Santalum form a monophyletic clade, with Osyris 
as their sister. Lastly, Gaiadendron, Opilia, and 
Schoepfia form a subclade (minus Misodendron) 
that closely corresponds to the rbcL group 1. 


DISCUSSION 


The goal of this project was not to resolve higher- 
level relationships among the angiosperms, but 
rather to evaluate the phylogenetic potential of 
complete 18S rDNA sequences through a com- 
parison of molecular phylogenies derived from both 
rbcL and 18S rDNA using similar taxon sampling 
and identical density and familial representation. 
The enormous phylogenetic potential of rbcL sequences 
is now well documented by numerous studies. 
In contrast, the phylogenetic utility of entire 
plant small-subunit ribosomal RNA sequences may 
have been underestimated. The recent study of 
Hamby & Zimmer (1992) certainly suggested that 
partial 18S, as well as 26S, sequences might help 
resolve the deepest branches of angiosperm phy- 
logeny. Nickrent & Franchina (1990) had previ- 
ously demonstrated that complete sequences of the 
18S region held considerable phylogenetic potential. 
The present study further illustrates the phylogenetic 
potential of entire 18S rDNA sequences. 

The present study indicates clearly that the rate 
of evolution of 18S rDNA is lower than that of 
rbcL. The percentage of sites that are potentially 
phylogenetically informative is almost twice as high 
for rbcL as for 18S rDNA (33.6% vs. 18.4%). 
However, because the 18S region is almost 400 
bp longer than rbcL, the ratio of the number of 
phylogenetically informative sites per molecule is 
only about 1.4 times greater for rbcL compared 
to 18S rDNA. Thus, the amount of variation af- 
forded per molecule is more comparable than sug- 
gested by rate of evolution alone. Because the 
number of variable sites in 18S rDNA is lower than 
for rbcL, complete sequencing of the entire 18S 
region becomes more critical for phylogenetic in- 
ference. Not only does this approach maximize the 
number of variable sites, but complete sequencing 
concomitantly facilitates proper alignment of 18S 
sequences. The two molecules also differ greatly 
in terms of the distribution of variation along each 
respective DNA region (Figs. 3 and 4). That is, 
base substitutions are spread much more evenly 
across the entire length of rbcL than for 18S 
rDNA. 

Considerable variation in the evolutionary rate 
of rbcL has been shown within the angiosperms 
(Wilson et al., 1990; Bousquet et al., 1992; Chase 
et al., 1993). Although lineage rate asymmetry 
can contribute to spurious branch attractions (Hen- 
dy & Penny, 1989; Albert et al., 1993), it may 
not be extensive enough between angiosperm lin- 
eages to be problematic in terms of phylogenetic 
reconstruction given sufficient taxon density (Chase 
et al., 1993). The extent of heterogeneity of evolutionary 
rates among most plant lineages for 18S 
rDNA is not yet known. Unequal rates of 18S 
rDNA sequence evolution are suggested, however, 
for some Santalales and other parasitic plants, which 
exhibit an accelerated rate of evolution compared 
to other angiosperms (Nickrent & Franchina, 1990; 
Nickrent & Starr, 1994). The number of nucleotide 
substitutions per site (K) in pairwise comparisons 
among five nonparasitic angiosperms averaged 
0.036 for 18S rDNA (Nickrent & Starr, 
1994). In contrast, pairwise comparisons using an 
obligate hemiparasite (Arceuthobium) and several 
holoparasites (Prosopanche, Balanophora, Rafflesia, 
and Rhizanthes) result in a mean K value 
of 0.115 (Nickrent & Starr, 1994). Investigations 
of other heterotrophic angiosperms such as Phol- 
isma (Lennoaceae) and Cuscuta (Convolvulaceae) 
have revealed similarly high substitution rates 
(Nickrent & Colwell, 1994). Accelerated substi- 
tution rates may also be present in Lepuropetalon 
and Peperomia based on the very long branch 
lengths these taxa exhibit (56 and 46, respectively). 
These long branch lengths could, however, simply 
be an artifact of the low taxon density of this 
analysis. It is noteworthy that Peperomia and Le- 
puropetalon also have much longer branch lengths 
than do their sister taxa in the rbcL tree depicted 
herein (Fig. 6), but this was not the case in the 
larger analysis of Chase et al. (1993) in which 
closer relatives of these taxa were included. Re- 
gardless of the cause of the long branch lengths in 
Peperomia and Lepuropetalon, the phylogenetic 
position of these taxa is similar in both the 18S 
and rbcL trees shown herein. 
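
The contrast between the two mean K values cited above, and the relationship between an observed proportion of differences and a corrected number of substitutions per site, can be sketched in Python. The Jukes-Cantor correction is used here purely as an illustration; the source does not state which substitution model underlies the K values of Nickrent & Starr (1994), so that choice is an assumption of this example.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site from observed proportion p.

    Assumed model for illustration only; K in the cited study may have been
    estimated differently.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Mean K values quoted in the text.
k_nonparasitic = 0.036
k_parasitic = 0.115
fold_increase = k_parasitic / k_nonparasitic

# Example: correcting an observed 3.4% difference (the Glycine-Spinacia value).
k_example = jukes_cantor(0.034)
```

The quoted means differ by a factor of roughly 3.2. At divergences this low the correction changes the estimate only slightly, so the observed proportions and K are nearly interchangeable for most nonparasitic comparisons.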

To evaluate the phylogenetic potential of 18S 
rDNA sequences, it is also important to elucidate 
the impact of secondary structure of the 18S rRNA 
transcript on phylogenetic reconstruction. As re- 
viewed recently (Dixon & Hillis, 1993), major 
questions remain regarding phylogenetic analysis 
of rRNA or rDNA data. These questions include: 
should loop bases (non-pairing bases) and stem bas- 
es (pairing bases) both be used in phylogenetic 
reconstruction and, if so, should bases from each 
class (stems and loops) be considered equally in- 
formative and independent? Wheeler & Honeycutt 
(1988) recommended that stem base nucleotides 
be eliminated from phylogenetic analyses, or 
weighted by one-half. In contrast, in a detailed 
analysis of 28S rRNA genes from vertebrates, Dix- 
on & Hillis (1993) found that characters from both 
stems and loops contain phylogenetic information. 
In addition, they found that stem bases sustain a 
greater number of compensatory mutations than 
would be expected at random, but the number of 
such mutations was less than 40% of that expected 
under a hypothesis of perfect compensation to 
maintain secondary structure. Dixon and Hillis 
therefore suggested that the weighting of stem 
characters be reduced by no more than 20% rel- 
ative to loop characters in phylogenetic analyses. 
In an analysis of 18S rRNA sequences from echi- 
noderms, Smith (1989) similarly reported that 
paired nucleotides were phylogenetically informative. 
Although the methods are at present not fully 
developed, incorporation of information from rRNA 
secondary (and tertiary) structure in phylogeny 
reconstruction algorithms is taking place (Van de 
Peer et al., 1993). These issues will require more 
attention in future phylogenetic studies of plants 
that use rDNA. 
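
The effect of the stem-weighting proposals discussed above on a tree length can be illustrated with a small Python sketch. The per-site step counts and the stem/loop assignments are hypothetical values chosen for the example; only the weights (1.0 for equal weighting, 0.8 as the lower bound suggested by Dixon & Hillis, 0.5 following Wheeler & Honeycutt) come from the text.

```python
def weighted_length(steps_per_site, is_stem, stem_weight):
    """Sum per-site steps, scaling stem sites by stem_weight and loop sites by 1."""
    if len(steps_per_site) != len(is_stem):
        raise ValueError("steps and stem mask must have the same length")
    return sum(
        steps * (stem_weight if stem else 1.0)
        for steps, stem in zip(steps_per_site, is_stem)
    )


# Hypothetical data: steps at five sites and whether each site is in a stem.
steps = [2, 0, 1, 3, 1]
stem_mask = [True, False, True, False, True]

equal_weights = weighted_length(steps, stem_mask, 1.0)
dixon_hillis = weighted_length(steps, stem_mask, 0.8)
wheeler_honeycutt = weighted_length(steps, stem_mask, 0.5)
```

Because reweighting changes the relative cost of stem and loop substitutions, it can change which topologies are shortest, which is why the choice of weighting scheme matters for rDNA data.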

Our comparison of a similar suite of 62 taxa for 
both rbcL and 18S rDNA sequences yielded phy- 
logenetic trees with a number of similar features, 
although we emphasize again that these trees should 
not be viewed as rigorous phylogenetic hypotheses 
for angiosperms. Both analyses revealed a well- 
supported monophyletic Gnetales as sister to a 
monophyletic Magnoliophyta, a result not too sur- 
prising given the sampling of taxa used. Within 
the angiosperms, both analyses revealed a mono- 
phyletic group of monocots (Zea, Oryza, Spar- 
ganium) that did not include Acorus as sister to 
other angiosperms. The distinctiveness of Acorus 
within the monocots was recently emphasized by 
Duvall et al. (1993). In both analyses, Nymphaea 
occurred in a similar position as sister to all other 
dicots. Both 18S rDNA and rbcL analyses rec- 
ognized several identical clades, including two groups 
of Paleoherbs (Houttuynia, Peperomia; and As- 
arum, Saruma, Aristolochia), Ranunculids (Ak- 
ebia, Berberidaceae, Ranunculus), several groups 
of Rosids (Brassica, Tropaeolum; Morus, Betu- 
laceae; Lepuropetalon, Euonymus; Chrysosplen- 
ium, Heuchera, Ribes), Santalales, and several 
groups of Asterids (Pittosporum, Hedera, Hydro- 
cotyle; Lycopersicon, Convolvulaceae). On a 
broader scale, very similar patterns of relationship 
are suggested among many of the Rosids and As- 
terids. Furthermore, both analyses suggest the 
presence of a large eudicot clade. At a lower tax- 
onomic level, nearly identical subclades were re- 
vealed within the Santalales by both 18S rDNA 
and rbcL sequences. The degree of resolution 
achieved within Santalales using 18S rDNA se- 
quences may, in part, reflect the accelerated rate 
of evolution of this region in this group of plants 
(Nickrent & Franchina, 1990; Nickrent & Starr, 
1994). 

The fact that phylogenetic analysis of 18S rDNA 
sequences for 62 taxa reveals relationships within 
angiosperms very similar to those obtained for a 
similar suite of taxa using rbcL sequences strongly 
suggests that questions of higher-level phylogeny 
in the angiosperms, as well as in seed plants in 
general, can be addressed with entire 18S rDNA 
sequences. The differences between the 18S and 
rbcL trees compared herein could, in large part, 
reflect taxon density, and also the use of different 
genera to represent some families (e.g., Polemoniaceae, 
Buxaceae, Convolvulaceae). Furthermore, 
the differences between the phylogenetic relation- 
ships gleaned from rbcL and 18S rDNA data may 
derive from their being, respectively, plastid and 
nuclear-encoded gene trees, neither of which per- 
fectly represents the true species tree. Our analyses 
reinforce the findings of others (e.g., Nickrent & 
Franchina, 1990; Martin & Dowd, 1991; Hamby 
& Zimmer, 1992; Conran & Dowd, 1993; Hoot 
et al., 1995, this issue) in suggesting that sequenc- 
ing of the 18S rDNA region holds considerable 
phylogenetic potential. 

Although comparative sequencing of the entire 
18S rDNA region holds potential for inferring phy- 
logeny, we stress that this nuclear region will almost 
certainly not elucidate familial and generic level 
relationships to the extent possible with rbcL se- 
quences simply because of the slower rate of evo- 
lution and lower overall number of base substitu- 
tions of 18S rDNA compared to rbcL. Whereas 
comparative rbcL sequencing has been used to 
resolve relationships within some angiosperm and 
gymnosperm families, including Onagraceae (Conti 
et al., 1993), Rosaceae (Morgan et al., 1994), 
Saxifragaceae s.s. (Morgan & Soltis, 1993), Tax- 
odiaceae (Brunsfeld et al., 1994), Cupressaceae 
(Gadek & Quin, 1993), and Ericaceae (Kron & 
Chase, 1993), similar resolution with 18S sequenc- 
es seems unlikely. In some santalalean families such 
as Viscaceae, 18S rDNA sequences have resolved 
generic-level relationships in a fashion comparable 
to that achieved via comparative rbcL sequencing. 
However, it is likely that the ability to resolve 
subfamilial and generic relationships in Santalales 
with 18S sequence data was facilitated by the high- 
er substitution rate for this region in these taxa. 
Nonetheless, these results for Santalales illustrate 
that, in some instances, 18S sequence variation 
can be useful within families. Concomitantly, these 
findings also indicate that the ability of 18S rDNA 
sequences to provide sufficient resolution within 
any particular order or family must be determined 
empirically, just as for rbcL. 

This study suggests that comparative sequencing 
of the 18S region should prove most useful for 
addressing phylogenetic relationships at the family 
level and above. Our results parallel those of Hoot 
et al. (1995), who showed in an analysis of Lar- 
dizabalaceae and other ranunculids that 18S rDNA 
sequences are more conserved than the two chlo- 
roplast genes employed (rbcL and atpB), but are 
useful in resolving relationships above the level of 
family. 18S rDNA sequence variation may be par- 
ticularly well suited for addressing deeper phylo- 
genetic branches within the angiosperms and in 
seed plants in general. The present study certainly 
indicates that additional sequencing of the entire 
18S rDNA region is justified to obtain a broad 
sampling of angiosperms and other seed plants for 
eventual comparison with rbcL-based tree topolo- 
gies (e.g., Chase et al., 1993). At present, only 
approximately 150 complete angiosperm 18S rDNA 
sequences exist (compared to over 1500 rbcL se- 
quences for angiosperms). Further 18S rDNA se- 
quencing within the monocots, Magnoliidae, Car- 
yophyllidae, Hamamelidae, and Dilleniidae (sensu 
Cronquist) is especially needed to achieve greater 
taxon density for the angiosperms. 